Using Containers on HPC Resources

Running Your Applications with Ease

Charles Peterson

2023-04-05

Learning Objectives 🎯

Welcome!

In this workshop, we will go over using containers on HPC resources, like UCLA’s Hoffman2.

  • Understand the basics of containers 📚
  • Learn how containers can be used in HPC environments 💻
  • Explore the benefits of containerization 🚀
  • Get familiar with Apptainer and its workflow 🛠️
  • Discover best practices for using containers

Files for this Presentation 📁

  • Viewing the slides
  • To download the presentation and example files, run the following command
git clone https://github.com/ucla-oarc-hpc/WS_containers

Containers: The Basics

What Are Containers?

  • Consistency across platforms ✔️
  • Isolation from host system 🔒
  • Lightweight and portable ✈️

Containerizing Applications 🛠️

Containers allow you to:

  • Package applications with their dependencies
  • Easily deploy and run them consistently across different systems.

Transferring Containers 🚚

Containers:

  • Are easily transferred between different HPC resources
  • Ensure a consistent environment for your software

Understanding Virtualization 🖥️

  • Containers provide a lightweight, portable, and consistent environment across different platforms.

  • To understand containers, we will first talk about virtualization.

Types of Virtualization 📐

  1. Hardware Virtualization
    • Creates virtual machines with independent OS and resources on a single physical host.
    • Example: VirtualBox, VMware, AWS EC2
  2. Operating System Virtualization (Containers)
    • Allows multiple isolated user-space instances on the same OS kernel.
    • Example: Docker, Apptainer, Podman
  3. Application Virtualization
    • Packages applications and their dependencies for execution on any compatible system.
    • Example: App-V, ThinApp, Turbo

Bare Metal Setup: No Virtualization 💻

  • ‘Bare metal’ refers to physical servers running directly on hardware without virtualization. 🔧
  • Software and applications are installed directly on the host operating system. 💿
  • Resources such as CPU, memory, and storage are dedicated and not shared with other virtual machines. 📊
  • Advantages: High performance, direct access to hardware, low overhead. 👍
  • Limitations: Less flexibility, limited isolation between applications, potential underutilization of resources. 👎
  • Software runs directly on the OS installed on the physical hardware
  • Most applications run this way
    • Including most software loaded with module load

Virtual Machines: Hardware-Level Virtualization 🖥️

  • Virtual machines (VMs) emulate physical computers and run multiple operating systems on a single host.
  • Each VM has its own virtual hardware, including CPU, memory, and storage. 💾
  • VMs are managed by a hypervisor (e.g., VirtualBox, VMware) that abstracts the physical hardware. 🎛️
  • VMs provide strong isolation between environments and are ideal for development, testing, and legacy applications. 🛡️
  • Limitations: Additional overhead due to full OS in each VM, performance may be affected by virtualization layer.
  • Applications running inside of a VM are running on a completely different set of (virtual) resources

  • A “Machine” within a “Machine”

OS Virtualization: Containers 🐳

  • OS virtualization with containers allows multiple, isolated user-space instances to run on a single host OS.
  • Containers share the host OS kernel but have their own file system, libraries, and dependencies.
  • Containers are lightweight, start quickly, and have lower overhead compared to VMs.
  • Containerization provides a consistent and reproducible environment across platforms.
  • Containers are ideal for microservices, cloud-native applications, and scalable deployments.
  • Applications running inside of a container run with the SAME kernel and physical resources as the host OS

  • An “OS” within an “OS”
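One way to see the shared kernel is to compare the kernel release reported on the host with the one reported inside a container. The sketch below assumes an image named ubuntu_20.04.sif (the file that `apptainer pull docker://ubuntu:20.04` would produce) is in the current directory; the file name and the SIF variable are illustrative. The container half is skipped if Apptainer or the image is missing.

```shell
#!/bin/bash
# Sketch: the container reports the HOST's kernel, but its OWN OS release.
# Assumption: ubuntu_20.04.sif was created with `apptainer pull docker://ubuntu:20.04`.
SIF="${SIF:-ubuntu_20.04.sif}"

host_kernel="$(uname -r)"
echo "Host kernel:      ${host_kernel}"

if command -v apptainer >/dev/null 2>&1 && [ -f "${SIF}" ]; then
    # Expected to match the host, because the kernel is shared
    echo "Container kernel: $(apptainer exec "${SIF}" uname -r)"
    # The userland (distribution) comes from the container image
    apptainer exec "${SIF}" grep PRETTY_NAME /etc/os-release
else
    echo "Skipping container check: apptainer or ${SIF} not found"
fi
```

The two kernel lines should agree, while the PRETTY_NAME line shows the distribution packaged in the image rather than the host's.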

Why use Containers

  • Bring your own OS 🌎
  • Portability ✈️
  • Reproducibility 🔁
  • Design your own environment 🎨
  • Version control 📑

Challenges with Software Installation 🛠️

  • Researchers face difficulties in managing software installations:
    • Spend time setting up software on Hoffman2
    • Figuring out which versions and modules to load for dependencies
    • Having to wait for System Admins to help
  • Then start all over when using software on a different HPC resource

HPC resources (like Hoffman2) are SHARED resources 👥

  • Researchers are running software on the same computing resource
  • No ‘sudo’ and limited yum/apt-get commands available 🚫

Container Advantages

  • Install your application once:
    • Use on any HPC resource 🌐
  • A ‘virtual’ OS:
    • Users can have complete OS admin control 🔏

  • Great for easily installing software with apt/yum 📦

  • Great if your software requires MANY dependencies that would be complex to install on Hoffman2. ⛓️

Containerization vs. Traditional Deployment ⚖️

  • Traditional Deployment:
    • Software dependencies must be installed on the host system. 📁
    • Conflicts can occur between different software versions. ⚠️
    • Challenging to achieve consistent environments across platforms. 📉
  • Containerization:
    • Dependencies are packaged within the container. 🎁
    • No conflicts with the host system or other containers. ☮️
    • Consistent and reproducible environments on any platform. 📈

Software for Containers 🔧

Podman 📦

  • Syntax similar to Docker’s
  • Does not need a root daemon process
  • Available on some HPC resources (not on Hoffman2, yet) 🔜

Docker 🐳

  • One of the most popular containerization tools
  • Many popular cloud container registries to store Docker containers:
    • DockerHub, GitHub Packages, Nvidia NGC
  • MPI over multiple servers not well supported 🚫
  • Most likely NOT available on many HPC systems (not on Hoffman2)

Apptainer

  • Formerly Singularity
  • Designed and developed for HPC systems 🖥️
  • Most likely installed on HPC systems (installed on Hoffman2)
  • Supports Infiniband, GPUs, MPI, and other devices on the Host
  • Can run Docker containers 🐋

Security considerations 🛡️

  • Built with shared user system environments in mind
  • NO daemon run by root 🚫
  • NO privilege escalation. Cannot gain control over host/Hoffman2 🔒
  • All permission restrictions outside of a container apply to the inside 🔐

Common Usage on Hoffman2 💡

To use Apptainer on Hoffman2, simply load the module:

module load apptainer
  • Only module you need to load!
    • Except for an MPI module if running in parallel

Common Apptainer Commands:

  1. Get a container from a registry
apptainer pull [options]
apptainer pull docker://ubuntu:20.04
  2. Build a container
apptainer build [options]
apptainer build myapp.sif myapp.def
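As a rough illustration of what a definition file such as myapp.def might contain, the sketch below writes a small one and shows the matching build command. The package choice (python3) and the %runscript section are assumptions made for illustration, not the workshop's actual files. Building usually needs root or a fakeroot setup, so the build step only runs when BUILD=1 is set.

```shell
#!/bin/bash
# Write a minimal, illustrative Apptainer definition file.
# The contents are an assumption for demonstration, not the workshop's myapp.def.
cat > myapp.def <<'EOF'
Bootstrap: docker
From: ubuntu:20.04

%post
    # Commands run inside the container at build time
    apt-get update
    apt-get install -y python3

%runscript
    # Command executed by `apptainer run myapp.sif`
    exec python3 "$@"
EOF

echo "Wrote myapp.def"

# Build only when explicitly requested and Apptainer is available
if [ "${BUILD:-0}" = "1" ] && command -v apptainer >/dev/null 2>&1; then
    apptainer build myapp.sif myapp.def
fi
```

The Bootstrap and From lines reuse the same Docker image as the pull example, so the container starts from Ubuntu 20.04 and adds only what %post installs.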

Common Usage Continued 🔧

Common Apptainer commands:

  1. Run a command within a container
apptainer exec [options] container.sif [command]
apptainer exec mypython.sif python3 test.py
# Runs the command `python3 test.py` inside the container
  2. Start an interactive session inside your container
apptainer shell [options] container.sif
apptainer shell mypython.sif

Note

Apptainer will NOT run on Hoffman2 login nodes.

Apptainer Workflow for running on H2 🔄

  1. Create 🛠️

  2. Transfer ↪️

  3. Run ▶️

Apptainer Workflow (Create) 🛠️

  1. Create 🛠️

  2. Transfer

  3. Run

  • Build a container
    • From Apptainer or Docker on your computer
    • Where you have root/sudo access
  • Use a pre-built container:

Apptainer Workflow (Transfer) ↪️

  1. Create

  2. Transfer ↪️

  3. Run

Bring your container to Hoffman2:

  • Copy your container to Hoffman2
scp test.sif username@hoffman2.idre.ucla.edu:
  • Pull a container from a container registry
apptainer pull docker://ubuntu:20.04
  • Use a container pre-built on Hoffman2
module load apptainer
ls $H2_CONTAINER_LOC

Apptainer workflow (Run) ▶️

  1. Create

  2. Transfer

  3. Run ▶️

Run Apptainer on your container:

  • Can run in an interactive (qrsh) session
qrsh -l h_data=5G
module load apptainer
apptainer exec mypython.sif python3 test.py
  • Or run as a Batch (qsub) job

  • Create job script myjob.job

#!/bin/bash
#$ -l h_data=10G
module load apptainer
apptainer exec mypython.sif python3 test.py
  • Submit your job
qsub myjob.job
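A slightly fuller job script might look like the sketch below, which writes the script to a file so it can be inspected before submission. The file name myjob_sketch.job, the log file naming, and the one-hour h_rt limit are placeholders chosen for illustration; adjust them to your own job.

```shell
#!/bin/bash
# Write an illustrative SGE job script; resource values are placeholders.
cat > myjob_sketch.job <<'EOF'
#!/bin/bash
# Run from the directory the job was submitted from
#$ -cwd
# Merge stderr into stdout and name the log after the job ID
#$ -j y
#$ -o myjob.$JOB_ID.log
# Memory per core and wall-clock limit (placeholder values)
#$ -l h_data=10G,h_rt=1:00:00

module load apptainer
apptainer exec mypython.sif python3 test.py
EOF

echo "Wrote myjob_sketch.job; submit with: qsub myjob_sketch.job"
```

Keeping the resource requests inside the script means the same `qsub` line works every time, and the requests are recorded alongside the commands they apply to.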

MAJOR TAKEAWAY

  • Apptainer containers run like any other application.
  • Run the same commands as you normally would
    • Just add an Apptainer command to any command you want to run inside the container

So….

python3 test.py
R CMD BATCH test.R

Turns into

apptainer exec myPython.sif python3 test.py
apptainer exec myR.sif R CMD BATCH test.R
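The "just add a prefix" idea can be captured in a small shell helper. The function name in_container and the DRY_RUN switch below are hypothetical conveniences, not part of Apptainer; the sketch only shows how an ordinary command line turns into an `apptainer exec` call.

```shell
#!/bin/bash
# Hypothetical helper (not part of Apptainer): prefix any command with
# `apptainer exec <image>` so that it runs inside the container.
SIF="${SIF:-mypython.sif}"

in_container() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it
        echo "apptainer exec ${SIF} $*"
    else
        apptainer exec "${SIF}" "$@"
    fi
}

# Dry run: shows the command that would be executed
DRY_RUN=1 in_container python3 test.py
```

With SIF left at its default, the dry run prints `apptainer exec mypython.sif python3 test.py`, which is the same command shown above.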

Examples

  • Example 1: Simple containers with TensorFlow
  • Example 2: GPU containers with PyTorch
  • Example 3: Parallel MPI containers

You can find the workshop material here:

git clone https://github.com/ucla-oarc-hpc/WS_containers

Example 1: TensorFlow (1) 🧠

  • This example uses TensorFlow
    • A great library for developing machine learning models
  • We will use the MNIST dataset
    • 60,000 training images of handwritten digits

We will use TensorFlow to train a model from this dataset

Example 1: TensorFlow (2)

  • Go to EX1 directory
  • Look at tf-example.py
    • This example uses TF to train a model on the MNIST data

Normally, to run this job, we would run

module load python
python3 tf-example.py

IT DOESN’T WORK!!! Need tensorflow installed!!!

  • You can install it yourself (maybe via pip or conda)
    • You may hit build errors
    • You have to build it again on another computer
  • Instead of installing it yourself, let’s find a container!

Example 1: TensorFlow (3)

Interactive

  • Start an interactive session
qrsh -l h_data=10G
  • Load the apptainer module
module load apptainer
  • Pull the TF container from DockerHub
apptainer pull docker://tensorflow/tensorflow:2.7.1
  • We see a file named tensorflow_2.7.1.sif
    • This SIF file is the container
    • This container is an OS with Python and TF installed inside
  • Start an interactive shell INSIDE the container
apptainer shell tensorflow_2.7.1.sif
  • Now that we are in the container, we can run Python with TF!
python3 tf-example.py

Tip

  • See that we didn’t need to load any python module!
  • We didn’t need to install any TF packages ourselves!!
  • Everything is inside the container!

Example 1: TensorFlow (4)

Batch

  • Going interactively inside the container (Previous slide)
    • apptainer shell [container.sif]
  • Run a single command in the container
    • apptainer exec [container.sif] [command]
qrsh -l h_data=10G
module load apptainer
apptainer pull docker://tensorflow/tensorflow:2.7.1
apptainer exec tensorflow_2.7.1.sif python3 tf-example.py

Alternatively, you can submit this as a batch job

  • Example job script: tf-example.job
qsub tf-example.job

Example 2: GPUs with PyTorch (1) 🎆

  • This example uses PyTorch with GPU support for faster speed 🚀
    • Another great Machine Learning framework
  • Go to the EX2 directory
    • Examine the pytorch_gpu.py file
    • It fits a third-order polynomial to a sine function
  • To run this example, we’ll need to find a container with GPU support!

Example 2: GPU job (2)

Let’s run python3 pytorch_gpu.py on a GPU node

  • Start an interactive session with a GPU compute node
qrsh -l h_data=10G,gpu,V100
  • Download the PyTorch container from Nvidia NGC
module load apptainer
apptainer pull docker://nvcr.io/nvidia/pytorch:22.03-py3
  • Run apptainer with the --nv option.
    • This enables the container to use the host’s GPU drivers
apptainer shell --nv pytorch_22.03-py3.sif
python -c "import torch; print('GPU is available' if torch.cuda.is_available() else 'GPU is NOT available')"
  • Run python3 as a single command
apptainer exec --nv pytorch_22.03-py3.sif python3 pytorch_gpu.py

Alternatively, you can submit this as a batch job using a job script

qsub pytorch_gpu.job
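Before a long run, it can help to confirm that the GPU is actually visible inside the container. The sketch below assumes the image pulled above is saved as pytorch_22.03-py3.sif in the current directory and compares a run without and with `--nv`; it is skipped when Apptainer or the image is missing.

```shell
#!/bin/bash
# Sketch: check GPU visibility inside the container, without and with --nv.
# Assumption: pytorch_22.03-py3.sif is the image pulled from Nvidia NGC.
SIF="${SIF:-pytorch_22.03-py3.sif}"
CHECK='import torch; print("CUDA available:", torch.cuda.is_available())'

if command -v apptainer >/dev/null 2>&1 && [ -f "${SIF}" ]; then
    # Without --nv the host GPU driver libraries are not bound into the container
    apptainer exec "${SIF}" python3 -c "${CHECK}"
    # With --nv they are, so on a GPU node this is expected to report True
    apptainer exec --nv "${SIF}" python3 -c "${CHECK}"
else
    echo "Skipping GPU check: apptainer or ${SIF} not found"
fi
```

If the `--nv` run still reports that CUDA is unavailable, the session is probably not on a GPU node.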

Example 3: Parallel MPI containers 🌐

In this example, we’ll run a parallel MPI container using NWChem, a popular computational chemistry application.

Many applications use MPI to run across multiple CPUs, and NWChem is one of them.

  • On Hoffman2, an NWChem container with MPI has already been built
    • $H2_CONTAINER_LOC/h2-nwchem_7.0.2.sif

Typically, we would run NWChem like this:

module load intel/2022.1.1
module load nwchem/7.0.2
`which mpirun` nwchem water.nw > water.out

To run inside the container:

  • Load the intel module
    • Sets up (INTEL)MPI on the host (outside the container)
  • Add mpirun in front of apptainer exec
module load intel/2022.1.1
`which mpirun` apptainer exec $H2_CONTAINER_LOC/h2-nwchem_7.0.2.sif nwchem water.nw  > water.out

An example batch job is located in EX3/nwchem.job

qsub nwchem.job
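The pattern above, where the host's mpirun launches one container instance per MPI rank, can be sketched as a small script. It assumes NSLOTS is set by the scheduler inside a parallel job (the default of 4 is only a placeholder) and that water.nw is in the current directory; this is an illustration, not the contents of EX3/nwchem.job.

```shell
#!/bin/bash
# Sketch: host mpirun starts one container instance per MPI rank.
# Assumptions: NSLOTS comes from the scheduler (4 is a placeholder default)
# and `module load intel/2022.1.1` has already put mpirun on the PATH.
NP="${NSLOTS:-4}"
SIF="${H2_CONTAINER_LOC:-.}/h2-nwchem_7.0.2.sif"

run_nwchem() {
    mpirun -n "${NP}" apptainer exec "${SIF}" nwchem water.nw
}

echo "Will run: mpirun -n ${NP} apptainer exec ${SIF} nwchem water.nw"

if command -v mpirun >/dev/null 2>&1 && command -v apptainer >/dev/null 2>&1 \
   && [ -f "${SIF}" ] && [ -f water.nw ]; then
    run_nwchem > water.out
else
    echo "Skipping run: mpirun, apptainer, the image, or water.nw is missing"
fi
```

The MPI library on the host and the one inside the container need to be compatible for this to work, which is why the Intel module is loaded outside the container.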

Considerations and Best Practices

  • 📦 Size of container
    • Keep it small and minimal
    • Include only necessary components for your applications
    • Large containers need more memory and take longer to start up
  • 👥 Share .sif files with your friends!
    • 🔧 Experiment with creating your own containers
    • Save your (Docker) containers to DockerHub or GitHub Packages
    • Find examples of Dockerfiles and Apptainer def files on our GitHub
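To keep an eye on container size, a quick listing of the images in a directory is often enough. The helper name list_images below is hypothetical; the sketch also shows `apptainer cache list`, which reports what earlier pulls have cached, when Apptainer is available.

```shell
#!/bin/bash
# Hypothetical helper: print the size of every .sif image in a directory.
list_images() {
    dir="${1:-.}"
    count=0
    for sif in "${dir}"/*.sif; do
        # When nothing matches, the glob stays literal, so skip non-files
        [ -f "${sif}" ] || continue
        du -h "${sif}"
        count=$((count + 1))
    done
    echo "Found ${count} image(s) in ${dir}"
}

list_images .

# Pulled layers are cached as well; this shows how much space they use
if command -v apptainer >/dev/null 2>&1; then
    apptainer cache list
fi
```

Large entries in either listing are good candidates for trimming or for rebuilding from a smaller base image.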

Thank you!

Questions? Comments? 🤔

Charles Peterson